Efficient Model Selection for Regularized Classification by Exploiting Unlabeled Data

Authors

Georgios Balikas

Ioannis Partalas

Éric Gaussier

Rohit Babbar

Massih-Reza Amini

Abstract

Hyper-parameter tuning is a resource-intensive task when optimizing classification models. The commonly used k-fold cross validation can become intractable in large-scale settings, where a classifier has to learn billions of parameters. At the same time, in real-world settings one often encounters multi-class classification scenarios with only a few labeled examples; model selection approaches often offer little improvement in such cases, and the default values of learners are used instead. We propose bounds for classification on accuracy and macro measures (precision, recall, F1) that motivate efficient schemes for model selection and can benefit from the existence of unlabeled data. We demonstrate the advantages of those schemes by comparing them with k-fold cross validation and hold-out estimation in the setting of large-scale classification.
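To make the cost argument concrete, here is a minimal sketch of the two baselines named in the abstract, k-fold cross validation and hold-out estimation, used to pick a regularization strength. It does not implement the bounds proposed in the paper. The shrunken-centroid learner, the synthetic data, the candidate grid, and all function names are assumptions chosen only to keep the example self-contained.

```python
import random

def train_centroids(X, y, lam):
    """Hypothetical stand-in learner: class centroids shrunk toward the global mean."""
    dim = len(X[0])
    global_mean = [sum(x[d] for x in X) / len(X) for d in range(dim)]
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * dim)
        for d in range(dim):
            acc[d] += x[d]
        counts[label] = counts.get(label, 0) + 1
    # A larger lam pulls every centroid harder toward the global mean.
    return {c: [(sums[c][d] + lam * global_mean[d]) / (counts[c] + lam)
                for d in range(dim)] for c in sums}

def predict(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((m - v) ** 2 for m, v in zip(model[c], x)))

def accuracy(model, X, y):
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)

def evaluate(X, y, train_idx, val_idx, lam):
    """Train on train_idx, then score on val_idx."""
    model = train_centroids([X[i] for i in train_idx], [y[i] for i in train_idx], lam)
    return accuracy(model, [X[i] for i in val_idx], [y[i] for i in val_idx])

def kfold_select(X, y, candidates, k=5):
    """Pick lam by k-fold cross validation; also count the training runs."""
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]
    runs, best_lam, best_score = 0, None, -1.0
    for lam in candidates:
        total = 0.0
        for val_idx in folds:
            held = set(val_idx)
            train_idx = [i for i in range(n) if i not in held]
            total += evaluate(X, y, train_idx, val_idx, lam)
            runs += 1
        if total / k > best_score:
            best_lam, best_score = lam, total / k
    return best_lam, runs

def holdout_select(X, y, candidates, val_fraction=0.2):
    """Pick lam with a single held-out validation split; also count the training runs."""
    n = len(X)
    n_val = max(1, int(n * val_fraction))
    val_idx, train_idx = list(range(n_val)), list(range(n_val, n))
    runs, best_lam, best_score = 0, None, -1.0
    for lam in candidates:
        score = evaluate(X, y, train_idx, val_idx, lam)
        runs += 1
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam, runs

# Synthetic three-class data (an assumption, not data from the paper).
rng = random.Random(0)
data = [([rng.gauss(3.0 * c, 1.0), rng.gauss(0.0, 1.0)], c)
        for c in range(3) for _ in range(30)]
rng.shuffle(data)
X = [p for p, _ in data]
y = [c for _, c in data]
candidates = [0.0, 1.0, 10.0, 100.0]

best_kfold, runs_kfold = kfold_select(X, y, candidates, k=5)
best_holdout, runs_holdout = holdout_select(X, y, candidates)
print("k-fold:  ", best_kfold, "after", runs_kfold, "training runs")
print("hold-out:", best_holdout, "after", runs_holdout, "training runs")
```

With four candidates and k = 5, k-fold cross validation trains 20 models while hold-out trains 4. That factor of k is the cost the abstract describes as intractable when each training run is itself expensive.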

Similar resources

Dimensionality Reduction of Hyperspectral Data to Increase Class Separability and Preserve Data Structure

Hyperspectral imaging, by gathering hundreds of spectral bands from the surface of the Earth, allows us to separate materials with similar spectra. Hyperspectral images can be used in many applications, such as estimation of chemical and physical land parameters, classification, target detection, and unmixing. Among these applications, classification is of particular interest. A hyperspectral im...

Manifold-Regularized Selectable Factor Extraction for Semi-supervised Image Classification

Feature selection methods are effective in modern computer vision applications for reducing the computational cost and the chance of over-fitting. Recently, a novel selectable factor extraction (SFE[3]) framework was proposed to perform feature selection and extraction simultaneously, and it has been shown, both theoretically and in practice, to be effective for high-dimensional data. Although it is advantageous ...

Evaluation and ranking of suppliers with fuzzy DEA and PROMETHEE approach

Supplier selection is a multi-criteria problem. This study proposes a hybrid model to support supplier selection and ranking. The research uses a two-stage model designed to fully rank the suppliers, where each supplier has multiple inputs and outputs. First, the supplier evaluation problem is formulated by Data Envelopment Analysis (DEA), since the regarded decision deals with uncertai...

Exploiting Ontology Structures and Unlabeled Data for Learning

We present and analyze a theoretical model designed to understand and explain the effectiveness of ontologies for learning multiple related tasks from primarily unlabeled data. We present both information-theoretic results as well as efficient algorithms. We show in this model that an ontology, which specifies the relationships between multiple outputs, in some cases is sufficient to completely...

Semi-supervised manifold learning approaches for spoken term verification

In this paper, the application of semi-supervised manifold learning techniques to the task of verifying hypothesized occurrences of spoken terms is investigated. These techniques are applied in a two stage spoken term detection framework where ASR lattices are first generated using a large vocabulary ASR system and hypothesized occurrences of spoken query terms in the lattices are verified in a...

Publication date: 2015
